Hybrid Dirichlet mixture models for functional data
Abstract
In functional data analysis, curves or surfaces are observed, up to measurement error, at a finite set of locations for, say, a sample of n individuals. Often the curves are homogeneous, except perhaps for individual-specific regions that exhibit heterogeneous behaviour (e.g. ‘damaged’ areas of irregular shape on an otherwise smooth surface). Motivated by applications with functional data of this nature, we propose a Bayesian mixture model aimed at dimension reduction, which represents the sample of n curves through a smaller set of canonical curves. We propose a novel prior on the space of probability measures for a random curve that extends the popular Dirichlet priors by allowing local clustering: non-homogeneous portions of a curve can be allocated to different clusters, and the n individual curves can be represented as recombinations (hybrids) of a few canonical curves. More precisely, the proposed prior envisions a conceptual hidden factor with k levels that acts locally on each curve. We discuss several models incorporating this prior and illustrate their performance with simulated and real data sets. We examine theoretical properties of the proposed finite hybrid Dirichlet mixtures, specifically their behaviour as the number of mixture components goes to infinity and their connection with Dirichlet process mixtures.
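To make the idea of curves as local recombinations of k canonical curves concrete, here is a minimal Python sketch that simulates such data. It is an illustration under stated assumptions, not the authors' model: the canonical curves θ_j(x) = j + sin(2πx), the AR(1) latent sequence, and the rule that assigns a local label by thresholding the latent value's normal CDF at the cumulative Dirichlet weights (a Gaussian-copula-style construction) are all our own choices, and every function name is hypothetical. All curves share the same k canonical curves, but the label can switch along the grid, so a single curve borrows different canonical curves in different regions.

```python
import itertools
import math
import random


def sample_dirichlet(alphas, rng):
    """Draw one probability vector from Dirichlet(alphas) by normalising Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def latent_ar1(length, rho, rng):
    """Stationary AR(1) sequence with N(0, 1) marginals (assumed latent process)."""
    z = [rng.gauss(0.0, 1.0)]
    scale = math.sqrt(1.0 - rho * rho)
    for _ in range(length - 1):
        z.append(rho * z[-1] + scale * rng.gauss(0.0, 1.0))
    return z


def label_from_latent(z, cumulative):
    """Map a latent N(0, 1) value to a label j so that, marginally, P(label = j) = p_j."""
    u = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    for j, c in enumerate(cumulative):
        if u <= c:
            return j
    return len(cumulative) - 1  # guard against round-off in the last cumulative weight


def simulate_hybrid_curves(n, k, alpha, grid_size, rho, sigma, seed=0):
    """Simulate n curves, each a local recombination (hybrid) of k canonical curves."""
    rng = random.Random(seed)
    # Assumed: symmetric Dirichlet(alpha/k, ..., alpha/k) weights shared by all curves.
    weights = sample_dirichlet([alpha / k] * k, rng)
    cumulative = list(itertools.accumulate(weights))
    grid = [t / (grid_size - 1) for t in range(grid_size)]
    # Hypothetical canonical curves theta_j(x) = j + sin(2*pi*x).
    canonical = [[j + math.sin(2.0 * math.pi * x) for x in grid] for j in range(k)]
    labels, curves = [], []
    for _ in range(n):
        z = latent_ar1(grid_size, rho, rng)
        lab = [label_from_latent(zt, cumulative) for zt in z]
        # Observation = locally selected canonical curve + Gaussian measurement error.
        y = [canonical[l][t] + rng.gauss(0.0, sigma) for t, l in enumerate(lab)]
        labels.append(lab)
        curves.append(y)
    return weights, canonical, labels, curves


if __name__ == "__main__":
    w, canon, labs, ys = simulate_hybrid_curves(
        n=3, k=3, alpha=1.0, grid_size=20, rho=0.95, sigma=0.1, seed=1
    )
    print("weights:", [round(p, 3) for p in w])
    for lab in labs:
        print("labels:", "".join(str(l) for l in lab))
```

With `rho` near 1 the latent sequence moves slowly, so a label persists over long stretches of the grid (the individual-specific regions described above); with `rho = 0` labels are drawn independently at each location. Drawing the weights from a symmetric Dirichlet(α/k, …, α/k) follows the usual finite approximation to a Dirichlet process, which is in the spirit of the k → ∞ connection mentioned in the abstract; the paper's exact parametrization may differ.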
Related articles
Hybrid Parallel Inference for Hierarchical Dirichlet Processes
The hierarchical Dirichlet process (HDP) can provide a nonparametric prior for a mixture model with grouped data, where mixture components are shared across groups. However, the computational cost is generally very high in terms of both time and space complexity. Therefore, developing a method for fast inference of HDP remains a challenge. In this paper, we assume a symmetric multiprocessing (S...
The Dirichlet Labeling Process for Functional Data Analysis
We consider problems involving functional data where we have a collection of functions, each viewed as a process realization, e.g., a random curve or surface. For a particular process realization, we assume that the observation at a given location can be allocated to separate groups via a random allocation process, which we name the Dirichlet labeling process. We investigate properties of this ...
The Dirichlet Labeling Process for Clustering Functional Data
We consider problems involving functional data where we have a collection of functions, each viewed as a process realization, e.g., a random curve or surface. For a particular process realization, we assume that the observation at a given location can be allocated to separate groups via a random allocation process, which we name the Dirichlet labeling process. We investigate properties of this ...
Topic Models over Text Streams: A Study of Batch and Online Unsupervised Learning
Topic modeling techniques are widely used in text data mining applications. Some applications use batch models, which perform clustering on the document collection in aggregate. In this paper, we analyze and compare the performance of three recently proposed batch topic models: Latent Dirichlet Allocation (LDA), Dirichlet Compound Multinomial (DCM) mixtures and von Mises-Fisher (vMF) mixture...
Truncation-free Hybrid Inference for DPMM
Dirichlet process mixture models (DPMM) are a cornerstone of Bayesian nonparametrics. While these models free the user from choosing the number of components a priori, computationally attractive variational inference often reintroduces the need to do so, via a truncation of the variational distribution. In this paper we present a truncation-free hybrid inference for DPMM, combining the advantages of sam...